Giant panda age recognition based on a facial image deep learning system

Abstract The conservation of the giant panda (Ailuropoda melanoleuca), as an iconic vulnerable species, has received great attention in the past few decades. As an important part of the giant panda population survey, the age distribution of giant pandas can not only provide useful instruction but also verify the effectiveness of conservation measures. The current methods for determining the age groups of giant pandas are mainly based on the size and length of giant panda feces and the bite value of intact bamboo in the feces, or in the case of a skeleton, through the wear of molars and the growth line of teeth. These methods have certain flaws that limit their applications. In this study, we developed a deep learning method to study age group classification based on facial images of captive giant pandas and achieved an accuracy of 85.99% on EfficientNet. The experimental results show that the faces of giant pandas contain some age information, which is mainly concentrated between the eyes of giant pandas. In addition, the results also indicate that it is feasible to identify the age groups of giant pandas through the analysis of facial images.


| INTRODUCTION
As a rare and vulnerable animal endemic to China and a flagship species for wildlife conservation, the giant panda has an extremely high research value and important conservation significance (Zhang et al., 2007). The giant panda is considered an umbrella species because the conservation strategies that protect it and its habitat also benefit sympatric species, such as the takin (Budorcas taxicolor), golden snub-nosed monkey (Rhinopithecus roxellana), crested ibis (Nipponia nippon), dwarf musk deer (Moschus berezovskii), red panda (Ailurus fulgens), and blood pheasant (Ithaginis cruentus) (Li & Pimm, 2016). In addition to wildlife, giant panda conservation also protects the ecosystem functions that are closely related to the survival of hundreds of millions of people in the Yangtze river basin (Kang et al., 2013; Pimm et al., 2018; Wang et al., 2022; Wei et al., 2018).
To conserve giant pandas in the wild, the Chinese government and international organizations, such as World Wide Fund for Nature (WWF), have carried out numerous studies on the population structure using several census methods including route survey, biological population horizontal density, distance discrimination, and distance-bite discrimination (Tang et al., 2015). At present, according to the results of the fourth national survey of giant pandas completed in 2014, the population of wild giant pandas has grown steadily, the habitat has been significantly expanded, and the conservation and management capabilities have gradually increased compared with the results of the previous three surveys (Tang et al., 2015). As a result of these improvements, the giant panda was downgraded from "endangered" to "vulnerable" by the International Union for the Conservation of Nature (IUCN) in 2017 (Swaisgood et al., 2016). However, despite significant conservation management success, fragmentation of the giant panda habitat remains a major threat to its survival, and some local populations still face survival risks. Furthermore, despite improvements in conservation methods, all previous national surveys have failed to obtain a comprehensive and accurate age structure of the wild panda population (Tang et al., 2015). Understanding the age structure of a population, particularly small, isolated populations, is very important for understanding the stability and reproductive potential of that population.
As different age groups have a large impact on the birth rate and mortality of a population, research on the age distribution of different giant panda populations can monitor and even predict population dynamics in real-time (Gong et al., 2012).
In recent years, age estimation, as an emerging biometric recognition technique, has become a research focus in computer vision. It has been widely used in areas such as age-based human-computer interaction, commercial law enforcement, and biometrics (Jain et al., 2004; Panis et al., 2016; Pinter et al., 2017; Raval & Shankar, 2015). Usually, age estimation is considered a multi-classification or regression problem. Most early face age estimation studies used hand-designed age-related facial features (Guo et al., 2009; Lowe, 1999) and then estimated face age with classification or regression models. However, the design of features and the choice of learning methods often require rich prior knowledge, which strongly affects the performance of age estimation. In recent years, deep convolutional neural networks (CNNs) have gained importance in the field of machine learning and pattern recognition due to their good feature extraction capabilities.
Several works have experimented with deep learning models for age classification or regression. Levi and Hassner (2015) learn representations with deep convolutional neural networks for automatic age and gender classification. Niu et al. (2016) treat age estimation as an ordinal regression problem and solve it with an end-to-end learning method using a CNN. Zhang et al. (2019) investigate the limitations of small-scale image compression models and propose a compact and efficient context-based cascaded age estimation model (C3AE). Zang et al. (2021) propose the LERep model for age estimation, aimed at applications on low-power devices. With the development of computer technology, researchers in the field of animal biometric recognition have started to conduct research on species recognition based on phenotypic appearance (Kühl & Burghardt, 2013; Polzounov et al., 2016; Gomez Villa et al., 2017; Norouzzadeh et al., 2018; Li et al., 2019). Similarly, methods used for face recognition with humans have been applied to wildlife (Kumar et al., 2017).
Currently, the method of classification for different age groups of giant pandas is still mainly based on biological methods, such as the size of giant panda feces and the length of intact bamboo found in feces, which is used to determine the different bite values for giant pandas (Hu, 2011). Additionally, the wear of molars and the growth line of teeth may be used to determine the age of a giant panda skeleton; however, this is not useful for determining the current age structure of a population (Wei et al., 2011). Although these methods of estimating the age groups of giant pandas are simple, they are impractical because they require a significant amount of difficult fieldwork to collect fecal samples, which is time-consuming and labor-intensive as well as having high-risk factors, high collection costs, and poor overall accuracy (Zhan et al., 2009). Therefore, new objective and reliable techniques are needed to effectively estimate the age groups of giant pandas.
To conserve and monitor giant pandas, relevant government agencies have deployed a large number of infrared cameras in the wild, which provides a foundation for noninvasive monitoring systems.
With the development of image processing technology, the improvement of the quality of image acquisition equipment, and the upgrading of image transmission methods, image-based age group recognition technology for giant pandas has become possible. In addition, the data used for age group classification are low-cost, noninvasive, safe, easy to collect, and not easily affected by natural factors. Therefore, this technology may overcome the limitations of current approaches to age group classification and population structure assessment for wild populations of giant pandas. In this paper, a deep learning method was developed to study the distinguishability of giant panda faces for age group classification. The experimental results show that giant pandas' faces contain some age information, and it is feasible to identify the age group of giant pandas by analyzing facial images. To our knowledge, there is no precedent for estimating the age groups of giant pandas from images. It is important to note that the availability of data captured by field cameras is limited. This paper therefore mainly introduces the dataset acquired from captive pandas.
This noninvasive approach is expected to provide technical support for the giant panda surveys and population monitoring.

| Data collection
By using a Panasonic dvx200 video camera and three digital cameras (Canon 1DXmarkII, Canon 5DmarkIII, and a Panasonic Lumix DMC-GH4), the image data of 218 captive pandas were collected. Under uniform selection conditions, we chose the face of the giant panda as the focus of the study, since faces contain many unique attributes and have been previously applied in facial age recognition (Antipov et al., 2017; Levi & Hassner, 2015; Samek et al., 2017) and individual panda recognition studies (Hou et al., 2019, 2020).
We collected 6441 images of 218 pandas and grouped them by age following Hu (2011), including 64 juvenile giant pandas (<1.5 years old) with 784 images, 97 subadult giant pandas (1.5-5.5 years old) with 1294 images, 121 adult giant pandas (5.5-20 years old) with 4035 images, and 14 old giant pandas (20-27 years old) with 328 images. Figure 2 shows samples from the four age groups, including simple samples that can be recognized by human vision and hard samples that may not be. Distinguishing the age of giant pandas from pictures is very difficult for human vision, which makes estimating age groups a challenging task. Figure 3 shows the number of images for each individual in each age group, where the x-axis represents the number of images, and the y-axis represents the number of individuals with the corresponding number of images. It is worth noting that some individual data span multiple age groups, which is why the per-group individual counts sum to more than 218.
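The reported counts make the class imbalance easy to quantify. The following minimal sketch uses only the image counts given above; the names `IMAGE_COUNTS`, `class_shares`, and `imbalance_ratio` are illustrative and not part of the study's code. Under these counts, adult images account for roughly 63% of the dataset and the old group for about 5%.

```python
# Image counts per age group, copied from the text above.
IMAGE_COUNTS = {
    "juvenile": 784,
    "subadult": 1294,
    "adult": 4035,
    "old": 328,
}


def class_shares(counts):
    """Return each age group's share of the total number of images."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def imbalance_ratio(counts):
    """Return the ratio between the largest and the smallest class."""
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    for name, share in class_shares(IMAGE_COUNTS).items():
        print(f"{name:>8}: {share:.1%}")
    print(f"largest/smallest class ratio: {imbalance_ratio(IMAGE_COUNTS):.1f}")
```

A ratio of this size between the adult and old groups is one reason to inspect per-class metrics rather than overall accuracy alone.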

| Image annotation
Based on the background of the above-mentioned dataset, we have performed regional annotations on the filtered and sorted images, which can be used for subsequent tasks such as target detection. The research area of this paper is the face of the giant panda, so the marked area is the facial features (i.e., ears, eyes, nose, and mouth) and face of the giant panda. Firstly, manual annotation was carried out through the

| Network and architectures
Many pandas in different age groups have very similar appearances and cannot be easily distinguished by the human eye (Figure 2e-h).
To learn the advanced features that may not be obvious in human vision, we needed to build a deep model to extract the features.
For this, we used a Convolutional Neural Network (CNN;LeCun et al., 1998) and Residual Learning (He et al., 2016a).
For giant panda images, a deep neural network needed to be constructed to extract features at a deep level; however, the training of deep neural networks faces the network degradation problem. Residual learning effectively addresses this problem and makes it possible to train neural networks with more than 1000 layers (He et al., 2016a). As the network gets deeper, the residual connection and identity mapping in the residual block (He et al., 2016b) can effectively avoid vanishing and exploding gradients, which makes the network easier to optimize and improves accuracy. Therefore, in this paper, ResNet (He et al., 2016a) was introduced as one of the experimental models for the age group classification of giant pandas. In addition, further improving the accuracy and speed of a network requires balancing the three dimensions of width, depth, and resolution, so EfficientNet (Tan & Le, 2019) was also introduced in this paper as an experimental model for the age group classification of giant pandas.
For this study, we used a modified ImageNet-pretrained model to fit our age group classification workflow (Figure 6). The last fully connected layer of the experimental model was replaced by a layer whose number of neurons depends on the task.
According to the age group division of giant pandas, we treated the age group classification of giant pandas as a multi-classification problem, so the fully connected layer attached to the feature extractor had four output neurons. The features were mapped to the scores of each category through the fully connected layer, the goal was to enlarge the score of the corresponding category, and the loss function was defined as:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s_{i,j}}}{\sum_{k=1}^{m} e^{s_{i,k}}}$$

where N is the number of samples, m is the number of categories, s_{i,k} is the score of the i-th sample for category k, and the subscript j represents the index position in the real class label. This loss function, known as the cross-entropy loss function, compares the predicted probability of a category with the true value after the model has produced a prediction and applies a penalty in logarithmic form. When training the model, the cross-entropy loss is minimized, i.e., the smaller the loss, the better the model. With this loss, the CNN is trained to output a probability over the m classes for each image, which makes it suitable for multi-class classification.
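To make the loss described above concrete, here is a minimal pure-Python sketch of softmax followed by cross-entropy for a batch of class scores. It is an illustration under stated assumptions rather than the training code used in the study: the function names are hypothetical, and labels are assumed to be integer class indices (for example 0 to 3 for the four age groups).

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(batch_scores, labels):
    """Mean negative log-probability assigned to the true class.

    batch_scores: list of per-sample score lists (one score per class).
    labels: list of true class indices, assumed to be integers.
    """
    losses = []
    for scores, true_index in zip(batch_scores, labels):
        probs = softmax(scores)
        losses.append(-math.log(probs[true_index]))
    return sum(losses) / len(losses)
```

With four equal scores the model is maximally uncertain and the loss equals log(4); raising the score of the true class lowers the loss, which is the behavior the text describes as enlarging the score of the corresponding category.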
As the focus of this study was giant panda facial images, in order to verify the feasibility of the deep learning technology, we first cropped the face according to the data annotation (Figure 5). The purpose was to reduce the influence of the background and let the model pay more attention to the foreground. As shown in Figure 6, we first process the original image of the giant panda, i.e., crop the face region (Figure 5), then input the face of the giant panda into the CNN model, and finally output the prediction results.

| Model training
Since some individual data span multiple age groups, i.e., images of the same panda can appear multiple times but at different ages, and facial image features contain individual information (Kumar  Table 1. Although the current dataset is the largest panda face dataset, it is still small for training deep learning methods. To alleviate this problem, we applied data augmentation to the training set, including random cropping, random flipping, and random changes in image brightness, contrast, and saturation (Figure 7). In addition, we tried Gaussian noise and random erasing to improve model generalization, but experiments showed that these two augmentation methods did not improve model performance, so we did not use them.

(Ruder, 2016). SGD is widely used by researchers around the world because it is good at finding flat minima that are relevant to generalization. Therefore, in this paper, SGD with a momentum of 0.9 was adopted to accelerate SGD in the relevant direction and dampen oscillations, and the weight decay was set to 5e-3. The initial learning rate was set to 1e-2 for EfficientNet-B0 and 2e-3 for ResNet and decayed by 0.3 every 6 epochs; the total number of epochs was 30, and the batch size was 16.
VGG was the exception: its learning rate was 1e-2 and decayed by 0.1 every 6 epochs, and its weight decay was set to 3e-3. The rest of the network configurations were consistent with the above.
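The learning-rate settings above can be tabulated with a short sketch. It assumes that "decayed by 0.3 every 6 epochs" means the learning rate is multiplied by 0.3 at epochs 6, 12, 18, and 24 (a conventional step decay); the text does not state this explicitly, and the function names are illustrative.

```python
def step_lr(initial_lr, epoch, gamma, step_size=6):
    """Learning rate at a 0-based epoch under multiplicative step decay.

    Assumption: the rate is multiplied by `gamma` once every `step_size` epochs.
    """
    return initial_lr * gamma ** (epoch // step_size)


def schedule(initial_lr, gamma, epochs=30, step_size=6):
    """Learning rate for every epoch of a training run."""
    return [step_lr(initial_lr, e, gamma, step_size) for e in range(epochs)]


if __name__ == "__main__":
    # Settings reported in the text: (initial learning rate, decay factor).
    configs = {
        "EfficientNet-B0": (1e-2, 0.3),
        "ResNet": (2e-3, 0.3),
        "VGG": (1e-2, 0.1),
    }
    for name, (lr, gamma) in configs.items():
        rates = schedule(lr, gamma)
        print(name, [f"{rates[e]:.2e}" for e in (0, 6, 12, 18, 24)])
```

Under this reading, each model trains at five distinct learning rates over the 30 epochs, with the last reduction applied at epoch 24.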
FIGURE 4 Examples of automatic annotating. The automatic annotation tool annotates the eyes, nose, ears, mouth, and the whole head of the giant panda, respectively.

To ensure the reproducibility of the methods, we used the same random seed for all experiments. In all experiments, we used Accuracy, Mean Absolute Error (MAE), and F1-Score as experimental evaluation criteria. Accuracy is defined as:

$$\mathrm{Accuracy} = \frac{1}{N}\sum_{i=1}^{N} I(y_i = \hat{y}_i)$$

where N is the number of samples, y_i is the true label, ŷ_i is the predicted label, and I(⋅) is the indicator function, i.e., the output value is 1 when the input is true, otherwise the output value is 0.

Mean Absolute Error is used to measure the average absolute error between the predicted and true values, and a smaller MAE indicates a better model, which is defined as:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$$

The F1-Score combines the precision and recall of a classifier into a single metric by taking their harmonic mean. In the binary classification task, F1-Score can be defined as:

$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where Precision indicates the proportion of actual positive samples among those predicted to be positive, which is defined as:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

Recall indicates the proportion of samples that are positive that are judged to be positive, which is defined as:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

where TP is the number of correct positive predictions; FP is the number of incorrect positive predictions; FN is the number of incorrect negative predictions.
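The evaluation criteria described above can be sketched in a few lines of pure Python. This is an illustration, not the study's evaluation code: it assumes the age groups are encoded as ordered integers (0 = juvenile, 1 = subadult, 2 = adult, 3 = old) so that MAE is meaningful, and it averages the per-class F1 scores with equal weight to obtain a macro-averaged F1 for the four-class task.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between ordinal class indices."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, cls):
    """Binary F1 score treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, num_classes=4):
    """Unweighted mean of the per-class F1 scores."""
    scores = [f1_for_class(y_true, y_pred, c) for c in range(num_classes)]
    return sum(scores) / num_classes
```

Because the macro average weights every class equally, a poorly recognized small class (such as the old group) lowers the score as much as a poorly recognized large class would.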

FIGURE 5
Since this experiment is a multi-classification task, we replace the F1-Score with Macro-F1. AUC can be interpreted as the probability that a randomly chosen positive example is ranked higher than a randomly chosen negative example. Since our experiment was a multi-classification task, we replaced AUC with the area under the macro-average ROC curve (Fawcett, 2006), just as we did with Macro-F1. The larger this area, the higher the accuracy of the model.
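The ranking interpretation of AUC can be computed directly by comparing every positive score with every negative score. The sketch below does this one class at a time (one-vs-rest) and averages the per-class values with equal weight, as a simple stand-in for the area under a macro-average ROC curve. The function names are illustrative, and the sketch assumes every class has at least one positive and one negative sample.

```python
def binary_auc(pos_scores, neg_scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win, following the usual rank-based convention.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def macro_auc(probs, labels, num_classes):
    """Unweighted mean of one-vs-rest AUC values.

    probs: per-sample lists of predicted class probabilities.
    labels: true class indices (integers).
    """
    aucs = []
    for c in range(num_classes):
        pos = [row[c] for row, y in zip(probs, labels) if y == c]
        neg = [row[c] for row, y in zip(probs, labels) if y != c]
        aucs.append(binary_auc(pos, neg))
    return sum(aucs) / num_classes
```

The pairwise loop is quadratic in the number of samples, which is acceptable for illustration; sorting-based implementations are preferable for large evaluation sets.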
The macro-average ROC curves of EfficientNet-B0 are given in

FIGURE 7 Example of raw and augmented images. The first row shows raw images, and the second row shows augmented images.
In order to further explain and understand the model's focus on each age group, we employed the Grad-CAM++ algorithm (Chattopadhay et al., 2018) on the final convolutional layer to localize and highlight the discriminative regions. Grad-CAM++ is an upgrade of Grad-CAM (Selvaraju et al., 2017), which is a class-discriminative localization method. It assigns a score to each class using the backpropagation-based filter gradient and convolution activation values, and it provides a way to see which parts of an image influence the model's decision. To determine the decision region for the giant panda's age, we manually classified and quantified the output regions of Grad-CAM++ into five categories: left eye, right eye, between the two eyes, nasal bridge area, and others (Figure 11). The decision region for the giant panda's age is mainly concentrated between the two eyes (Figure 12). These new findings need to be further confirmed by biological and wildlife researchers.
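As a rough illustration of how such heatmaps are formed, the sketch below implements plain Grad-CAM, the predecessor method, not Grad-CAM++ itself: each feature map of the final convolutional layer is weighted by the spatial average of its gradient, the weighted maps are summed, and negative values are clipped. Grad-CAM++ replaces the uniform spatial average with a pixel-wise weighting of positive gradients, which is not reproduced here. The inputs are assumed to be precomputed nested lists, and the function name is illustrative.

```python
def grad_cam(activations, gradients):
    """Plain Grad-CAM heatmap from precomputed activations and gradients.

    activations: K feature maps, each an H x W nested list.
    gradients: gradients of the class score with respect to those maps,
               in the same K x H x W layout.
    Returns an H x W heatmap scaled to [0, 1] when it has a positive peak.
    """
    height = len(activations[0])
    width = len(activations[0][0])
    # Channel weight = global average of that channel's gradient.
    weights = [
        sum(sum(row) for row in grad) / (height * width) for grad in gradients
    ]
    cam = [[0.0] * width for _ in range(height)]
    for weight, fmap in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]
    # ReLU: keep only regions that support the class.
    cam = [[max(value, 0.0) for value in row] for row in cam]
    peak = max(max(row) for row in cam)
    if peak > 0.0:
        cam = [[value / peak for value in row] for row in cam]
    return cam
```

In practice the resulting map is upsampled to the input resolution and overlaid on the face image, after which regions such as the area between the eyes can be inspected.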

| DISCUSSION AND CONCLUSION
The age distribution of giant pandas is a key factor in understanding the population dynamics of the giant pandas in the wild, especially in the case of small, isolated populations that are at higher risk of local extinction (Wei et al., 2011; Xia & Hu, 2006).

FIGURE 9 Confusion matrix of EfficientNet-B0.

FIGURE 10 Visualization of the feature spaces defined by the features learned by the proposed model. Black represents the negative samples, while other colors indicate positive samples; × represents juveniles, ★ represents subadults, ■ represents adults, and represents elders.
between the eyes and nose of the giant panda to the area between the two eyes.
In this experiment, we found that while the classification accuracy of the juvenile, subadult, and adult age groups was high, the classification accuracy of elderly giant pandas was not, and the system often misclassified elderly pandas as adults. By analyzing the distribution of database samples, we speculated that data imbalance may lead to the poor classification accuracy for older pandas. To address the imbalanced data, we adopted cost-sensitive re-weighting methods, which assign different weights to the samples to adjust their importance, because resampling methods tend to lead to overfitting. In this paper, we tried Focal Loss (Lin et al., 2017), VS Loss (Kini et al., 2021), and IB Loss (Park et al., 2021) to balance the differences between the elderly group and the other categories. However, we were surprised to find that the classification accuracy of the elderly group did not improve significantly with these methods. Therefore, we further speculated that the low classification accuracy of the elderly group may also be due to the improved living conditions and the longer life span of captive giant pandas compared with wild individuals.
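Of the re-weighting losses mentioned above, Focal Loss has the simplest form: the cross-entropy term for a sample is scaled by (1 - p_t)^γ, where p_t is the probability the model assigns to the true class, so that easy, well-classified samples contribute less. The sketch below is a minimal single-sample version; the values of `gamma` and `alpha` are illustrative assumptions, since the settings used in the study are not given in the text, and VS Loss and IB Loss are not shown.

```python
import math


def focal_loss(probs, label, gamma=2.0, alpha=1.0):
    """Focal loss for one sample.

    probs: predicted class probabilities (should sum to 1).
    label: index of the true class.
    gamma: focusing parameter; gamma = 0 recovers plain cross-entropy.
    alpha: optional class weight (assumed value, not from the study).
    """
    p_t = probs[label]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
```

For example, with γ = 2 a sample predicted at p_t = 0.9 is scaled by 0.01, while a hard sample at p_t = 0.1 is scaled by 0.81, so training concentrates on hard cases such as elderly pandas that resemble adults.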
In the wild, the life expectancy for giant pandas is 20 years, while the oldest captive panda lived to be 38 years old (Colin, 2020; Song et al., 2006; Zhao et al., 2017). At the same time, we will further study the age group prediction of giant panda images in the wild.

FIGURE 11 Quantitative results for the decision area. The decision area contains the left eye, the right eye, between the eyes, the nasal bridge area, and others.

FIGURE 12 Visualization results via
As infrared cameras are widely used in conservation research fields, such as population assessment, animal resource investigation, and human-animal conflict (Carthew & Slater, 1991;Cutler & Swann, 1999;Karanth et al., 2004;Li et al., 2010;Martorello et al., 2001), they have become an increasingly powerful tool for wildlife detection around the world. A large number of images of wild animals are being collected, which can provide a reliable and sufficient data source for wildlife image recognition research. It is beneficial to the development and application of computer identification technology in the field of biology in the future, and can effectively solve the problem of inadequate utilization of monitoring data in protected areas, as well as save a lot of manpower and material resources.
At present, the combination of infrared camera data and computer graphics processing technology for further research is maturing (Brehar et al., 2021;Dai et al., 2021;Zhang & Rao, 2022;Aji et al., 2022). We expect to introduce a better recognition algorithm in the near future that will play a greater role in ecological and behavioral research of endangered and rare species, such as the giant panda, and provide a solid foundation for improving wildlife conservation and management.

CONFLICT OF INTEREST
None of the authors have any conflict of interest to declare.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in "Image data set based on the age of giant pandas" at https://doi.org/10.5061/dryad.m63xsj43n.

ETHICAL APPROVAL AND CONSENT TO PARTICIPATE
The methods, use of materials, and all experimental procedures